Multimodal Speaker Identification using Adaptive Decision Fusion with Reliability Weighted Summation
Authors
Abstract
We present a multimodal open-set speaker identification system that integrates information from the audio, face and lip motion modalities. To fuse the modalities, we employ the so-called product rule with a novel adaptive, reliability-based weighting structure. The proposed adaptive product rule is more robust in the presence of unreliable modalities, provided that the reliability measure is effective in assessing classifier decisions. The proposed reliability measure, which is well suited to the open-set speaker identification problem, is used to make more robust accept and reject decisions. Experimental results that support this assertion are provided.
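As a rough illustration of the idea, a weighted product rule over per-modality likelihoods is equivalent to a weighted sum of log-likelihoods, which is where "reliability weighted summation" in the title comes from. The sketch below is not the authors' implementation: the abstract does not define the reliability measure, so the `reliability` function (gap between the best and second-best score), the weight normalization, the `threshold` parameter and all scores are assumptions made purely for illustration.

```python
def reliability(log_likelihoods):
    """Hypothetical reliability measure (an assumption, not from the paper):
    the gap between the best and second-best speaker scores of one modality."""
    ranked = sorted(log_likelihoods.values(), reverse=True)
    return ranked[0] - ranked[1]


def fuse(modality_scores, threshold):
    """Reliability-weighted summation of log-likelihoods with open-set rejection.

    modality_scores: {modality: {speaker: log-likelihood}}
    threshold: assumed acceptance threshold on the fused score
    Returns (decision, fused_scores, weights); decision is None on reject.
    """
    raw = {m: reliability(s) for m, s in modality_scores.items()}
    total = sum(raw.values())
    # Normalize reliabilities to weights; fall back to equal weights if all gaps are zero.
    weights = {m: (r / total if total > 0 else 1.0 / len(raw)) for m, r in raw.items()}

    speakers = next(iter(modality_scores.values())).keys()
    # Weighted sum of log-likelihoods == log of the weighted product rule.
    fused = {
        spk: sum(weights[m] * modality_scores[m][spk] for m in modality_scores)
        for spk in speakers
    }

    best = max(fused, key=fused.get)
    # Open-set step: accept the best speaker only if its fused score clears the threshold.
    decision = best if fused[best] >= threshold else None
    return decision, fused, weights


# Made-up scores: audio is confident, face and lip motion are nearly uninformative.
scores = {
    "audio": {"alice": -1.0, "bob": -5.0},
    "face": {"alice": -3.0, "bob": -2.5},
    "lip_motion": {"alice": -2.0, "bob": -2.5},
}
decision, fused, weights = fuse(scores, threshold=-2.0)
print(decision, fused, weights)
```

With these made-up numbers the audio modality receives most of the weight, so the nearly uninformative face and lip motion scores have little influence on the fused decision. Raising the threshold turns the same input into a reject.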
Similar resources
Adaptive classifier cascade for multimodal speaker identification
We present a multimodal open-set speaker identification system that integrates information coming from audio, face and lip motion modalities. For fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each mo...
Multimodal speaker/speech recognition using lip motion, lip texture and audio
We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before. The emphasis of this work is on the benefits of including the lip motion modality in two distinct cases: speaker and speech recognition. The audio modality is represente...
Discrimination Analysis of Lip Motion Features for Multimodal Speaker Identification and Speech-reading
In this thesis a new multimodal speaker/speech recognition system that integrates audio, lip texture, lip geometry, and lip motion modalities is presented. There have been several studies that jointly use audio, lip intensity and/or lip geometry information for speaker identification and speech-reading applications. This work proposes using explicit lip motion information, instead of or in addi...
Chapter 16: Joint Audio-Video Processing for Robust Biometric Speaker Identification in Car
In this chapter, we present our recent results on a multilevel Bayesian decision fusion scheme for the multimodal audio-visual speaker identification problem. The objective is to improve recognition performance over conventional decision fusion schemes. The proposed system decomposes the information in a video stream into three components: speech, lip trace and face texture. Lip trac...
Fuzzy logic decision fusion in a multimodal biometric system
This paper presents a multi-biometric verification system that combines speaker verification and fingerprint verification with face identification. Their respective equal error rates (EER) are 4.3%, 5.1%, and a range of 5.1% to 11.5% for matched conditions in facial image capture. Fusion of the three by majority voting gave a relative improvement of 48% over speaker verification (i.e. the best-...
Publication date: 2004